Fix/realtime tts voice rewire by atomer-nvidia · Pull Request #181 · nvidia-riva/python-clients

atomer-nvidia · 2026-05-18T13:31:08Z

No description provided.

Defer the pyaudio import to the points where it is actually needed (MicrophoneStream.__enter__, SoundCallBack.__init__, list_*_devices, get_*_info). Default WAV-output flows now work on machines without PortAudio headers installed. When pyaudio is missing, raise an ImportError that explicitly tells the user to install portaudio19-dev first. This addresses the VDR finding that fresh-box users got blocked by a bare ModuleNotFoundError with no install instructions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The riva-asr/nmt/tts client scripts historically exit 0 on most error paths, including "Unavailable model", connection refused, empty/invalid input, and missing files. CI pipelines that chain these scripts with && therefore silently swallow real failures. Add a cli_main decorator that translates uncaught exceptions into a small, consistent set of exit codes:

- 2 = bad input (missing/empty file, ValueError, IsADirectoryError)
- 3 = gRPC UNAVAILABLE (server down, wrong port, network)
- 4 = gRPC INVALID_ARGUMENT / NOT_FOUND (bad model/lang/voice)
- 1 = anything else
- 130 = SIGINT

The decorator also writes the error to stderr so CI logs surface the cause instead of the script swallowing it. A follow-up commit wires this into each client script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
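A sketch of how a decorator could map exceptions to the exit codes listed for cli_main. It assumes gRPC failures are `grpc.RpcError` instances exposing `.code()`, which holds for errors raised by grpcio call objects. The constant names other than `EXIT_BAD_INPUT` are hypothetical, and this is not the actual `argparse_utils` implementation. grpc is imported optionally here only so the sketch also runs without grpcio installed.

```python
import functools
import sys

try:
    import grpc  # optional in this sketch so it runs without grpcio
except ImportError:
    grpc = None

# Exit codes from the commit message; names other than EXIT_BAD_INPUT are hypothetical.
EXIT_OK = 0
EXIT_FAILURE = 1
EXIT_BAD_INPUT = 2
EXIT_UNAVAILABLE = 3
EXIT_BAD_REQUEST = 4
EXIT_SIGINT = 130

_BAD_REQUEST_STATUSES = {"INVALID_ARGUMENT", "NOT_FOUND"}


def _grpc_status_name(exc):
    """Return the gRPC status name (e.g. 'UNAVAILABLE') for an RpcError, else None."""
    if grpc is not None and isinstance(exc, grpc.RpcError) and hasattr(exc, "code"):
        return exc.code().name
    return None


def cli_main(func):
    """Translate uncaught exceptions from a CLI main() into exit codes."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            rc = func(*args, **kwargs)
            return EXIT_OK if rc is None else rc
        except KeyboardInterrupt:
            return EXIT_SIGINT
        except (FileNotFoundError, IsADirectoryError, ValueError) as exc:
            print(f"error: {exc}", file=sys.stderr)
            return EXIT_BAD_INPUT
        except Exception as exc:
            # Always report the cause on stderr so CI logs show it.
            print(f"error: {exc}", file=sys.stderr)
            status = _grpc_status_name(exc)
            if status == "UNAVAILABLE":
                return EXIT_UNAVAILABLE
            if status in _BAD_REQUEST_STATUSES:
                return EXIT_BAD_REQUEST
            return EXIT_FAILURE

    return wrapper
```

A script would decorate its `main()` and end with `sys.exit(main())` so the returned code reaches the shell. `KeyboardInterrupt` needs its own clause because it derives from `BaseException`, not `Exception`, and would otherwise bypass the catch-all.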

…ation Address the VDR 26.02 finding that python-clients CLIs exit 0 on most error paths across all three modalities. Each script now:

- Wraps main() with @cli_main so gRPC and OS errors propagate to a real exit code instead of being printed and swallowed.
- Calls sys.exit(main()) so the chosen exit code reaches the shell.

Script-specific fixes:

- scripts/nmt/nmt.py: Drop the inner request() try/except that swallowed every gRPC status; let cli_main translate it. Empty/whitespace --text and missing --text-file now return EXIT_BAD_INPUT (was: silent exit 0). Document --max-len-variation as decoder-token units with valid range [0, 256] and default 20, and add a note on Arabic chunking.
- scripts/tts/talk.py: Reject whitespace-only --text up front (defense-in-depth pair to the server-side fix in riva-speech that closed the hang on `--text " "`). Drop the broad `except Exception` that stringified gRPC errors and exited 0.
- scripts/asr/transcribe_file*.py: Replace `print(...); return` on missing input files with EXIT_BAD_INPUT. Remove the silent grpc.RpcError swallow in transcribe_file_offline.py.
- scripts/asr/transcribe_mic.py + realtime_asr_client.py + tts/talk.py: The pyaudio install hint now mentions `apt-get install -y portaudio19-dev` (Debian/Ubuntu) and `brew install portaudio` (macOS), pairing with the prereqs doc landed in documentation_2.
- scripts/tts/realtime_tts_client.py: Drop the module-level `from riva.client.audio_io import SoundCallBack` import (it was unused and pulled in pyaudio eagerly, defeating the lazy import). Drop the broad `except Exception` that mapped every failure to exit 1.
- scripts/nmt/nmt_speech_to_{text,speech}.py: Drop the unused `import grpc`; remove the catch-all that printed "Error during translation" and exited 0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
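The NMT input checks (whitespace-only `--text`, a missing `--text-file`, and a `--text-file` with no non-empty lines) can be sketched as a standalone `main()` that returns `EXIT_BAD_INPUT` instead of exiting 0. The helper `collect_inputs` and the argument layout are illustrative assumptions, not the real `scripts/nmt/nmt.py`.

```python
import argparse
import sys
from pathlib import Path

EXIT_BAD_INPUT = 2  # "bad input" code from the cli_main commit


def collect_inputs(text=None, text_file=None):
    """Return the non-empty inputs to translate, or an empty list if there are none."""
    if text is not None:
        return [text] if text.strip() else []
    if text_file is None:
        return []
    path = Path(text_file)
    if not path.is_file():
        return []
    lines = (line.strip() for line in path.read_text(encoding="utf-8").splitlines())
    return [line for line in lines if line]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Input-validation sketch (hypothetical).")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--text")
    source.add_argument("--text-file")
    args = parser.parse_args(argv)

    inputs = collect_inputs(args.text, args.text_file)
    if not inputs:
        print("error: no non-empty input text", file=sys.stderr)
        return EXIT_BAD_INPUT
    for line in inputs:
        print(line)  # stand-in for the real translation request
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Returning the code from `main()` and passing it to `sys.exit()` is what lets a `&&` chain in CI stop on bad input.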

VDR 26.02 found that realtime_tts_client.py silently ignored --voice and fell back to the server default (Mia). Tracing the WebSocket flow showed that the synthesize_session.update payload was built by deep-mutating the response from POST /v1/realtime/synthesis_sessions. That response is an InitialSynthesisSessionConfig, which carries id/object/client_secret fields not present in BaseSynthesisSessionConfig (the type the server validates the update against). Carrying those keys through to the override, combined with the shallow .copy() and the _safe_update_config nested-dict mutation, is what let the voice_name override fail to land on published 26.02 NIMs.

Build the update payload explicitly from CLI args instead, so that only fields the user actually overrode reach the server, in the exact shape documented in the SynthesisSessionUpdateMessage schema. Bump the override summary to INFO so users can see which fields were sent.

After the synthesize_session.updated response, compare the server-applied voice_name and language_code against what was requested and log a WARNING on mismatch. This is defense-in-depth: any future server-side drop surfaces in the client log instead of as a wrong-sounding audio file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
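A sketch of the override-only payload and the post-update check, under loud assumptions. The flat `{"type": ..., "session": {...}}` layout is illustrative only, since the real SynthesisSessionUpdateMessage schema may nest `voice_name` and `language_code` differently. The function names are hypothetical. The idea shown is that the message is built fresh from CLI values, so nothing from the session-creation response (`id`, `object`, `client_secret`) can leak into it.

```python
import logging
from typing import Any, Dict, List, Optional

log = logging.getLogger("realtime_tts_sketch")


def build_session_update(voice_name: Optional[str] = None,
                         language_code: Optional[str] = None) -> Dict[str, Any]:
    """Build a synthesize_session.update message from CLI overrides only.

    ASSUMPTION: the flat "session" object is illustrative; check the
    SynthesisSessionUpdateMessage schema for the real layout.
    """
    requested = {"voice_name": voice_name, "language_code": language_code}
    # Keep only the fields the user actually set; never copy the server's config.
    session = {key: value for key, value in requested.items() if value is not None}
    log.info("Sending session overrides: %s", session or "none")
    return {"type": "synthesize_session.update", "session": session}


def check_applied(requested: Dict[str, Any], applied: Dict[str, Any]) -> List[str]:
    """Compare the config reported in synthesize_session.updated with the request."""
    mismatched = []
    for key in ("voice_name", "language_code"):
        if key in requested and applied.get(key) != requested[key]:
            log.warning("Requested %s=%r but the server applied %r",
                        key, requested[key], applied.get(key))
            mismatched.append(key)
    return mismatched
```

Only fields that were requested are compared, so a session that relies on server defaults never triggers a spurious warning.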

Only import parse_custom_configuration and pass custom_configuration to synthesize/synthesize_online when --custom-configuration is supplied, so talk.py keeps working against older riva-client wheels that lack the function and the kwarg.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
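A sketch of the guard, assuming `parse_custom_configuration` is importable from `riva.client.argparse_utils` and takes the raw `--custom-configuration` string. Both the import location and the signature are assumptions to verify against the installed wheel, and `custom_configuration_kwargs` is a hypothetical helper. The point is that neither the import nor the kwarg happens unless the flag is set.

```python
from typing import Any, Dict, Optional


def custom_configuration_kwargs(custom_configuration: Optional[str]) -> Dict[str, Any]:
    """Return the extra synthesize()/synthesize_online() kwargs, if any.

    Without --custom-configuration nothing is imported and no kwarg is added,
    so older riva-client wheels see exactly the call they already support.
    """
    if not custom_configuration:
        return {}
    # ASSUMPTION: import location and single-string signature; check your wheel.
    from riva.client.argparse_utils import parse_custom_configuration
    return {"custom_configuration": parse_custom_configuration(custom_configuration)}
```

A caller would splat the result into the synthesis call, e.g. `service.synthesize(..., **custom_configuration_kwargs(args.custom_configuration))`; an empty dict leaves the call unchanged.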

cli_main and EXIT_BAD_INPUT were added recently in argparse_utils and are not present in older riva-client wheels. Wrap their imports in a try/except across all asr/nmt/tts client scripts, falling back to a no-op decorator and EXIT_BAD_INPUT=2 so the scripts keep running against older installed wheels (only the structured exit codes are lost in that case).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
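A minimal sketch of the guarded import, assuming the names live in `riva.client.argparse_utils` as the commit describes. A single `except ImportError` covers both cases: an old wheel where the module lacks the names (Python raises `ImportError` for a missing name in `from ... import`) and an environment without riva-client at all (`ModuleNotFoundError`, a subclass).

```python
import sys

try:
    # Present only in newer riva-client wheels.
    from riva.client.argparse_utils import EXIT_BAD_INPUT, cli_main
except ImportError:
    EXIT_BAD_INPUT = 2

    def cli_main(func):
        """No-op fallback: the script still runs; only structured exit codes are lost."""
        return func


@cli_main
def main() -> int:
    return 0


if __name__ == "__main__":
    sys.exit(main())
```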

fix: nmt: surface EXIT_BAD_INPUT when --text-file has no non-empty lines

Aligns the default sample rate of scripts/tts/talk.py and riva.client.SpeechSynthesisService synthesize/synthesize_online with the HTTP /v1/audio/synthesize default (22050 Hz), so the same call over either transport yields the same audio when the rate is left unspecified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
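One way to keep the CLI and library defaults from drifting apart is a single shared constant. This is a sketch only: the `--sample-rate-hz` flag name and the `synthesize_request` stand-in are assumptions for illustration, not the real talk.py or `SpeechSynthesisService` signatures.

```python
import argparse

# 22050 Hz matches the HTTP /v1/audio/synthesize default per this commit.
DEFAULT_SAMPLE_RATE_HZ = 22050


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="TTS sample-rate default sketch")
    # ASSUMPTION: flag name; check `talk.py --help` for the real option.
    parser.add_argument("--sample-rate-hz", type=int, default=DEFAULT_SAMPLE_RATE_HZ)
    return parser


def synthesize_request(text: str, sample_rate_hz: int = DEFAULT_SAMPLE_RATE_HZ) -> dict:
    """Hypothetical stand-in for a library call; it only shows the shared default."""
    return {"text": text, "sample_rate_hz": sample_rate_hz}
```

Because both paths read the same constant, leaving the rate unspecified on either path requests identical audio.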

* Make pyaudio an optional dependency in audio_io
* Add cli_main decorator with structured CLI exit codes
* Wire cli_main into asr/nmt/tts client scripts and tighten input validation
* Send override-only payload for realtime TTS session update
* Guard TTS custom_configuration usage for backwards compatibility
* Guard cli_main/EXIT_BAD_INPUT imports for backwards compatibility
* fix: nmt: surface EXIT_BAD_INPUT when --text-file has no non-empty lines
* Default TTS sample rate to 22050 Hz to match HTTP API
* Addressing review comments

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Yuvaraj Dharavath <ydharavath@nvidia.com>

atomer-nvidia and others added 5 commits May 14, 2026 11:11

atomer-nvidia requested review from rmittal-github and virajkarandikar May 18, 2026 13:31

atomer-nvidia and others added 4 commits May 18, 2026 19:08

fix: nmt: surface EXIT_BAD_INPUT when --text-file has no non-empty lines (929ecb1)

Merge pull request #1 from ydharavath/fix/realtime-tts-voice-rewire (663b82b)

fix: nmt: surface EXIT_BAD_INPUT when --text-file has no non-empty lines

rmittal-github reviewed May 26, 2026

Comment thread scripts/asr/transcribe_file_offline.py

rmittal-github reviewed May 26, 2026

Comment thread riva/client/realtime.py Outdated

rmittal-github reviewed May 26, 2026

Comment thread riva/client/realtime.py Outdated

Addressing review comments (fbdf454)

rmittal-github approved these changes May 26, 2026

rmittal-github merged commit 20f1a48 into nvidia-riva:main May 26, 2026

rmittal-github merged 10 commits into nvidia-riva:main from atomer-nvidia:fix/realtime-tts-voice-rewire

atomer-nvidia commented May 18, 2026

3 participants
